Web page language identification based on URLs
Similar resources
Web page language identification based on URLs
Given only the URL of a web page, can we identify its language? This is the question that we examine in this paper. Such a language classifier is, for example, useful for crawlers of web search engines, which frequently try to satisfy certain language quotas. To determine the language of uncrawled web pages, they have to download the page, which might be wasteful, if the page is not in the desi...
Ontology based Web Page Topic Identification
With the emergence of the web, considerable research effort has gone into Web Mining. This paper proposes an approach for automatic topic identification from web pages. The contribution of this research is an approach to automatic topic identification of web pages that can provide better results. The topic of a web document is identified through an ontological approach.
Improving Language Identification of Web Page Using Optimum Profile
Language is an indispensable tool for human communication, and presently, the language that dominates the Internet is English. Language identification is the process of determining a predetermined language automatically from a given content (e.g. The ability to identify other languages in relation to English is highly desirable. It is the goal of this research to improve the method used to achi...
URL-Based Web Page Classification: With n-Gram Language Models
There are some situations these days in which it is important to have an efficient and reliable classification of a web-page from the information contained in the Uniform Resource Locator (URL) only, without the need to visit the page itself. For example, a social media website may need to quickly identify status updates linking to malicious websites to block them. The URL is very concise, and ...
Phishing Detection based on Web Page Similarity
Phishing is a current social engineering attack that results in online identity theft. Phishing Web pages generally use similar page layouts, styles (font families, sizes, and so on), key regions, and blocks to mimic genuine pages in an effort to convince Internet users to divulge personal information, such as bank account numbers and passwords. A novel technique to visually compare an assumed ...
Journal
Journal title: Proceedings of the VLDB Endowment
Year: 2008
ISSN: 2150-8097
DOI: 10.14778/1453856.1453880